Transcript mapping for handwritten Arabic documents
Authors
Abstract
Handwriting recognition research requires large databases of word images, each labeled with the word it contains. Scanned page images, however, usually contain whole sentences or paragraphs of writing. Creating a labeled database of isolated word images is therefore tedious: a person must drag a rectangle around each word in the full image and type in its label. Transcript mapping is the automatic alignment of the words in a text file with word locations in the full image, and it can ease the creation of such research databases. We propose the first transcript mapping method for handwritten Arabic documents. Our approach is based on Dynamic Time Warping (DTW) and makes two main algorithmic contributions. The first is an extension to DTW that uses true distances when mapping multiple entries of one series to a single entry of the second series. The second is a method for concurrently mapping the elements of a partially aligned third series within the main alignment. Preliminary results are provided.
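The abstract does not spell out how the "true distance" extension works. The sketch below illustrates one possible reading, under stated assumptions. Classic DTW scores a run of entries matched to a single entry as the sum of their individual distances. The hypothetical variant instead merges the run first and computes a single distance on the merged value. The function names `dtw` and `grouped_dtw`, the `combine` and `max_group` parameters, and the use of scalar features are all illustrative assumptions. They are not the authors' published formulation, and the third-series mapping is not modeled.

```python
from typing import Callable, List, Optional, Sequence, Tuple

INF = float("inf")


def dtw(a: Sequence[float], b: Sequence[float],
        dist: Callable[[float, float], float] = lambda x, y: abs(x - y),
        ) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW. A run of a-entries matched to one b-entry is scored as the
    SUM of the individual distances d(a[i], b[j])."""
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(a[i - 1], b[j - 1]) + min(
                cost[i - 1][j - 1],  # one-to-one step
                cost[i - 1][j],      # a[i-1] shares b[j-1] with a[i-2]
                cost[i][j - 1],      # b[j-1] shares a[i-1] with b[j-2]
            )
    # Backtrack from (n, m) to recover the warping path as 0-based index pairs.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def grouped_dtw(a: Sequence[float], b: Sequence[float],
                combine: Callable[[Sequence[float]], float],
                dist: Callable[[float, float], float],
                max_group: int = 3,
                ) -> Tuple[float, List[Tuple[List[int], int]]]:
    """HYPOTHETICAL many-to-one variant (an assumption, not the paper's algorithm):
    up to `max_group` consecutive a-entries may map to a single b-entry, and the
    group is scored ONCE as dist(combine(group), b[j]) on the merged value."""
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[int]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Try every group size k: a[i-k:i] all map to b[j-1].
            for k in range(1, min(max_group, i) + 1):
                prev = cost[i - k][j - 1]
                if prev == INF:
                    continue
                c = prev + dist(combine(a[i - k:i]), b[j - 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = k
    if cost[n][m] == INF:
        # No valid grouping, e.g. len(a) < len(b) or len(a) > max_group * len(b).
        return INF, []
    # Backtrack: each b-entry receives exactly one group of a-indices.
    groups: List[Tuple[List[int], int]] = []
    i, j = n, m
    while j > 0:
        k = back[i][j]
        groups.append((list(range(i - k, i)), j - 1))
        i, j = i - k, j - 1
    groups.reverse()
    return cost[n][m], groups


if __name__ == "__main__":
    a = [1.0, 1.0, 3.0, 2.0]
    b = [2.0, 3.0, 2.0]
    print(dtw(a, b))
    # (2.0, [(0, 0), (1, 0), (2, 1), (3, 2)])
    print(grouped_dtw(a, b, combine=sum, dist=lambda x, y: abs(x - y)))
    # (0.0, [([0, 1], 0), ([2], 1), ([3], 2)])
```

In the demo, classic DTW pays a distance of 1 for each of the two 1.0 entries matched to the first 2.0. The grouped variant sums them to 2.0 and pays nothing. This difference is the kind of effect a "true distance" for many-to-one matches could capture.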
Related papers
Holistic Approach for Classifying and Retrieving Personal Arabic Handwritten Documents
This paper presents a novel holistic technique for classifying and retrieving Arabic handwritten text documents. The retrieval of Arabic handwritten documents is performed in several steps. First, the Arabic handwritten document images are segmented into words, and then each word is segmented into its connected parts. Second, several features are extracted from these connected parts and then co...
Transcript mapping for handwritten Chinese documents by integrating character recognition model and geometric context
Creating document image datasets with ground-truths of regions, text lines and characters is a prerequisite for document analysis research. However, ground-truthing large datasets is not only laborious and time consuming but also prone to errors due to the difficulty of character segmentation and the large variability of character shape, size and position. This paper describes an effective reco...
Classification of Personal Arabic Handwritten Documents
This paper presents a novel holistic technique for classifying Arabic handwritten text documents. The classification of Arabic handwritten documents is performed in several steps. First, the Arabic handwritten document images are segmented into words, and then each word is segmented into its connected parts. Second, several structural and statistical features are extracted from these connected ...
Transcript mapping for historic handwritten document images
There is a large number of scanned historical documents that need to be indexed for archival and retrieval purposes. A visual word spotting scheme that would serve these purposes is a challenging task even when the transcription of the document image is available. We propose a framework for mapping each word in the transcript to the associated word image in the document. Coarse word mapping bas...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate data entry and digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Publication date: 2007